Motivation

  • New York City is known for its concrete jungle, but there is much more to explore regarding green spaces.
  • Throughout history, many disinvested areas found gardens as a sense of community following the 1970s.
  • Very little has been explored with community gardens and other factors such as economic and health conditions. Our goal was to further understanding these relationships.

Questions

Our initial question was to explore how gardens were distributed throughout the city and how these gardens affect the people in each community. As we moved through the project, we solidified specific questions and this website now answers four main questions. We wanted to group specific related variables by theme, hoping to make the information easier to digest for the reader. For example, information about poverty, budgets and rent burden is all grouped in the economic investment page.

  1. How are community gardens distributed throughout the city?
  2. How are the locations of gardens related to certain demographic characteristics?
  3. What is the relationship between community gardens and economic investment?
  4. What is the relationship between community gardens and select health conditions?

Data

Data Sources

Data Cleaning

Each dataset that was obtained was first individually cleaned. Most were already well organized, and unnecessary variables were removed for efficiency and clarity. Average variables were calculated in these steps as well.

# Cleaning and tidying primary NYC garden data
gardens = 
  read_csv("./data/nyc_community_gardens.csv") %>% 
  janitor::clean_names() %>% 
  mutate(
    community_board = replace(community_board, community_board == "N/A", NA)
  ) %>% 
  drop_na(community_board) %>% 
  relocate(community_board) %>% 
  select(-prop_id,-census_tract, -bin, -bbl, -nta, -council_district, -cross_streets, -jurisdiction, -postcode)

# Cleaning and tidying demographic data
demo = 
  read_xlsx("./data/health_profiles.xlsx",
  sheet = "CHP_all_data", skip = 1) %>% 
  janitor::clean_names() %>% 
  slice(-(1:6)) %>% 
  slice_head(n = 59) %>% 
  mutate(
    id_spatial = id, 
    id = str_replace(id, "^1", "M"),
    id = str_replace(id, "^2", "X"),
    id = str_replace(id, "^3", "B"),
    id = str_replace(id, "^4", "Q"),
    id = str_replace(id, "^5", "R"),
    gentrification = replace(gentrification, gentrification == "n/a", NA),
    avertable_death = replace(avertable_death, avertable_death == "n/a", NA),
    avertable_death = replace(avertable_death, avertable_death == "^", NA)
  ) %>% 
  rename(community_board = id, not_complete_hs = edu_did_not_complete_hs, hs_some_college = edu_hs_grad_some_college, college_higher = edu_college_degree_and_higher) %>% 
  select(id_spatial, community_board:age65plus, on_time_hs_grad, not_complete_hs, hs_some_college, college_higher, poverty, rent_burden, obesity, hypertension, life_expectancy, self_rep_health)

# Cleaning and tidying property data
property = 
  read_csv("./data/property_values.csv") %>% 
  janitor::clean_names() %>% drop_na(community_board) %>% 
  rename(old_community_board = community_board) %>% 
  mutate(
    boro_letter = case_when(
      borough == "MANHATTAN" ~ "M",
      borough == "BRONX" ~ "X",
      borough == "BROOKLYN" ~ "B",
      borough == "QUEENS" ~ "Q",
      borough == "STATEN IS" ~ "R"
    ),
    board_num = ifelse(nchar(old_community_board) == 1, paste0(0, old_community_board), old_community_board)
  ) %>% 
  unite("community_board", boro_letter, board_num, sep = "") %>% 
  relocate(community_board, old_community_board, borough) %>% 
  select(community_board, original_market_value, revised_market_value)

# Grouping property data by community board to prepare for merging.
property_tidy = 
  property %>% 
  group_by(community_board) %>% 
  summarize(
    avg_org_value = mean(original_market_value, na.rm = TRUE),
    avg_rev_value = mean(revised_market_value, na.rm = TRUE)
  )

# Cleaning and tidying budget data
budget = 
  read_csv("./data/budget_tracker.csv") %>% 
  janitor::clean_names() %>% drop_na(community_board) %>% 
  rename(old_community_board = community_board) %>% 
  mutate(
    boro_letter = case_when(
      borough == "MANHATTAN" ~ "M",
      borough == "BRONX" ~ "X",
      borough == "BROOKLYN" ~ "B",
      borough == "QUEENS" ~ "Q",
      borough == "STATEN IS" ~ "R"
    ),
    board_num = ifelse(nchar(old_community_board) == 1, paste0(0, old_community_board), old_community_board)
  ) %>% 
  unite("community_board", boro_letter, board_num, sep = "") %>% 
  relocate(community_board, old_community_board, borough) %>% 
  select(community_board, vote_year, total_appropriated)

# Grouping budget data by community board to prepare for merging and calculating the average budgets for each community board.
budget_tidy = 
  budget %>% 
  group_by(community_board) %>% 
  summarize(
    avg_tot_appropriated = mean(total_appropriated, na.rm = TRUE)
  )

Afterwards, we merged all datasets into one tidied dataset.

final_tidy1 =
  left_join(demo, gardens, by = "community_board") %>% 
  left_join(., property_tidy, by = "community_board") %>% 
  left_join(., budget_tidy, by = "community_board")

# Calculating number of gardens in each community board
num_gardens <-
  final_tidy1 %>% 
  drop_na(garden_name) %>% 
  group_by(community_board) %>% 
  summarize(garden_num = n())

unique_comboard <-
  final_tidy1 %>% 
  distinct(community_board, .keep_all = TRUE)

# Complete merging
final_tidy <-
  left_join(unique_comboard, num_gardens, by = "community_board") %>% 
  replace_na(list(garden_num = 0)) 

final_extra <-
  left_join(final_tidy1, num_gardens, by = "community_board") %>% 
  replace_na(list(garden_num = 0))

write_csv(final_extra, "./data/final_extra.csv")
write_csv(final_tidy, "./data/final_df.csv")
write_csv(final_tidy1, "./data/final_df_main.csv")

The final tidied dataset contains 59 observations (the number of community boards in NYC) and consists of the following 36 variables:

  • id_spatial. Unique spatial ID for each board.
  • community_board. Community board location of each garden.
  • borough. Borough location of each community board.
  • name. Neighborhood name location of each community board.
  • overall_pop. Overall population number in each community board.
  • race_white. Percent self-reporting non-hispanic white.
  • race_black. Percent self-reporting non-hispanic black.
  • race_asian. Percent self-reporting non-hispanic asian.
  • race_latino. Percent self-reporting hispanic/latino.
  • race_other. Percent self-reporting other (American Indian, Native Hawaiian, Pacific Islander, two or more races).
  • age0to17. Percent age between 0 and 17 years.
  • age18to24. Percent age between 18 and 24 years.
  • age25to44. Percent age between 25 and 44 years.
  • age45to64. Percent age between 45 and 64 years.
  • age65plus. Percent age 65 years and older.
  • on_time_hs_grad. Percentage of public high school freshman from the 2013-2014 school year who graduated in 4 years
  • not_complete_hs. Percentage of adults ages 25 and older whose highest level of education is less than a high school diploma or GED.
  • hs_some_college. Percentage of adults ages 25 and older who have a high school diploma or a high school diploma and some college.
  • college_higher. Percentage of adults ages 25 and older who obtained an educational degree above a High School diploma (Associate’s, Bachelor’s, or Graduate or professional degree).
  • poverty. Percentage of residents living below 100% of New York City’s calculated poverty threshold based on income and necessary expenses.
  • rent_burden. Percentage of renter-occupied homes whose gross rent (contract rent plus estimated average monthly cost of utilities including electricity, gas, and water and sewer) is equal to or greater than 30 percent of household income in past 12 months.
  • obesity. Percentage of adults ages 18 and older who have obesity (Body Mass Index of 30 or greater) based on self-reported height and weight.
  • hypertension. Percentage of adults ages 18 and older who report ever being told by a healthcare professional that they have hypertension, also known as high blood pressure.
  • life_expectancy. Life expectancy at birth.
  • self_rep_health. Percentage of adults ages 18 and older who report their overall health is “excellent,” “very good” or “good” on a scale of excellent, very good, good, fair or poor.
  • boro. Borough location of each community board.
  • garden_name. Name of garden.
  • address. Address of garden.
  • size. Size of garden in acres.
  • neighborhood_name. Neighborhood name location of each community board
  • latitude. Latitude location of garden.
  • longitude. Longitude location of garden.
  • avg_org_value. Average original market value.
  • avg_rev_value. Average revised market value.
  • avg_tot_appropriated. Average budget allocated to each community board.
  • garden_num. Number of gardens in each community board.

Exploratory Analysis

Our exploratory analysis is divided into 3 main parts. Garden distribution focuses on the geographic location of the gardens, the number of gardens in each community board, and also includes demographic information (age, education, race) for each board and borough. Health conditions provides a visual comparing number of gardens to percent obesity, hypertension, life expectancy, and self reported health. Economic investment shows a visual comparison between budget allocated to each board and market value, poverty percent and rent burden percent.

Garden Distribution

To explore the distribution of gardens in NYC, we first created a leaflet showing each garden as well as the outline of each community board.

# Read in tidied dataset.
garden =
  read_csv("./data/final_df.csv") %>%
  group_by(community_board) %>%
  select(id_spatial, garden_num, community_board, borough, overall_pop, race_white:college_higher, garden_name:longitude)

garden_all = 
  read_csv("./data/final_df_main.csv")

# Create leaf icons for leaflet map.
leafIcons <- icons(
  iconUrl = "http://leafletjs.com/examples/custom-icons/leaf-green.png",
  iconWidth = 10, iconHeight = 20
)

# Prepare for leaflet outline.
working <- getwd()
com_board_spdf <- readOGR(dsn = working, layer = "community_board_new", verbose = FALSE)
invisible(names(com_board_spdf))
data_spatial <- merge(com_board_spdf, garden, by.x = "boro_cd", by.y = "id_spatial")

# Add popup labels for map.
garden_all = 
  garden_all %>%
     mutate(
      click_label =
        str_c("<b>", garden_name, "</b><br>", borough, "</b><br>", size, " acres"))

# Prepare the text for labels:
mytext <- paste(
    "Borough: ", data_spatial@data$borough,"<br/>",
    "Community Board: ", data_spatial@data$community_board,"<br/>", 
    "Number of Gardens: ", data_spatial@data$garden_num, "<br/>", 
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by number of gardens in the community district
leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons(data = data_spatial,
    fillOpacity = 0, 
    smoothFactor = 0.5, 
    stroke = TRUE, 
    color = "black", 
    weight = 1,
    label = mytext,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>% 
    addMarkers(data = garden_all, ~longitude, ~latitude, icon = leafIcons, popup = ~click_label)

We also created a graph showing the distribution of gardens by number.

We then created tables showing an overview of demographic information allows the reader to gain a better understanding of the landscape of New York City.

Age Distribution

# Creating age tables
age_tables =
  garden_all %>% 
  select(borough, community_board, age0to17:age65plus) %>% 
  distinct()

# Formatting column names
colnames(age_tables) = c("Borough", "Community Board", "0 to 17 years", "18 to 24 years", "25 to 44 years", "45 to 64 years", "65 plus")

# Interactive table including averages
reactable(age_tables, groupBy = "Borough",highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE, 
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"), 
    defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  'Community Board' = colDef(),
  "0 to 17 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "18 to 24 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "25 to 44 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "45 to 64 years" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "65 plus" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

Education Distribution

# Creating education table
edu_tables =
  garden_all %>% 
  select(borough, community_board, on_time_hs_grad:college_higher) %>% 
  distinct()

# Additional order formatting
edu_tables = 
  edu_tables %>% 
 relocate(not_complete_hs, .after = community_board)

# Creating column names
colnames(edu_tables) = c("Borough", "Community Board", "Less than High School", "High School", "Some College", "College or higher")

# Creating interactive table and averages
reactable(edu_tables, groupBy = "Borough", highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE,
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"), defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  'Community Board' = colDef(),
  "Less than High School" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "High School" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "Some College" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  "College or higher" = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

Race Distribution

# Creating race table
race_tables =
  garden_all %>% 
  select(borough, community_board, race_white:race_other) %>% 
  distinct()

# Formatting column names
colnames(race_tables) = c("Borough", "Community Board", "White", "Black", "Asian", "Latino", "Other")

# Creating interactive table and averages
reactable(race_tables, highlight = TRUE, searchable = TRUE, striped = FALSE, fullWidth = FALSE, showSortIcon = FALSE, 
          theme = reactableTheme(
    borderColor = "#89C281",
    highlightColor = "#C0DCBC"),
          groupBy = "Borough", defaultColDef = colDef(
      align = "left",
      minWidth = 100),
    columns = list(
  "Community Board" = colDef(),
  White = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Black = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Asian = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Latino = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%")),
  Other = colDef(aggregate = "mean", format = colFormat(digits = 2, suffix = "%"))))

The map illustrates that there are many gardens distributed around the city, with more gardens in Brooklyn, Queens and the Bronx. Additionally, we can see that the community boards are sectioned in fairly similar sizes.

The age table shows that the distribution in age across the five boroughs are similar, with the majority of people being 25 to 44 years old. However, there seems to be less children living in Manhattan when compared to the other boroughs. In terms of education, there is a larger difference. It appears that most people living in New York City have a high school education, with percentages tapering quickly moving towards college. Manhattan has the highest percentage of a college education or higher (65%), which is much higher than all other boroughs, more than doubling the percentage in the Bronx. Race also appears to be highly divided. While the majority of people in Staten Island are white, the majority of people in the Bronx are Latino. Manhattan looks to be most evenly distributed, showing the difference in racial makeup for each geographical location.

This information serves as a foundation for the other analyses to build off of.

Health Conditions

For health conditions, we selected a few diseases and other measures of health that have been associated with green spaces, being outdoors and have shorter term effects. As a result, we focused on obesity, hypertension, life expectancy and self-reported health. In order to allow for comparisons with number of gardens, we created choropleths for each health condition to allow for side by side comparisons.

# Creating subset of tidied data for health exploration
health_data_final <-
  read_csv("./data/final_df.csv") %>% 
  select(community_board, borough, obesity, hypertension, life_expectancy, self_rep_health, garden_num, id_spatial) 

working <- getwd()
com_board_spdf <- readOGR(dsn = working, layer = "community_board_new", verbose = FALSE)
invisible(names(com_board_spdf))
health_data_spatial <- merge(com_board_spdf, health_data_final, by.x = "boro_cd", by.y = "id_spatial")
# Results from spatial regression or cluster identification to assess the relationship between these variables and community gardens

# Create a color palette for the obesity map:
pal2 <- colorNumeric("BuPu", health_data_spatial@data$obesity)

# Prepare the text for obesity labels:
mytext2 <- paste(
  "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Percent Obesity: ", health_data_spatial@data$obesity, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by percent obesity in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", obesity)(obesity), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext2,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal2, values = ~obesity, opacity = 0.9, title = "Percent Obesity", position = "topleft" )
# Create a color palette for the hypertension map:
pal3 <- colorNumeric("BuPu", health_data_spatial@data$hypertension)

# Prepare the text for hypertension labels:
mytext3 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Percent Hypertension: ", health_data_spatial@data$hypertension, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by percent hypertension in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", hypertension)(hypertension), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext3,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal3, values = ~hypertension, opacity = 0.9, title = "Percent Hypertension", position = "topleft" )
# Create a color palette for the life expectancy map:
pal4 <- colorNumeric("BuPu", health_data_spatial@data$life_expectancy)

# Prepare the text for life expectancy labels:
mytext4 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Life Expectancy: ", health_data_spatial@data$life_expectancy, "<br/>", 
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by life expectancy in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", life_expectancy)(life_expectancy), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext4,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal4, values = ~life_expectancy, opacity = 0.9, title = "Life Expectancy", position = "topleft" )
# Create a color palette for the self reported health map:
pal5 <- colorNumeric("BuPu", health_data_spatial@data$self_rep_health)

# Prepare the text for self reported health labels:
mytext5 <- paste(
    "Borough: ", health_data_spatial@data$borough,"<br/>",
    "Community Board: ", health_data_spatial@data$community_board,"<br/>", 
    "Self Reported Health: ", health_data_spatial@data$self_rep_health, "<br/>",
    "Number of Gardens: ", health_data_spatial@data$garden_num, "<br/>",
    sep = "") %>%
  lapply(htmltools::HTML)
 
# Final Map with shaded colors by self reported health in the community district
leaflet(health_data_spatial) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  setView( lat = 40.7, lng = -74 , zoom = 10) %>%
  addPolygons( 
    fillOpacity = 0.8, 
    smoothFactor = 0.5, 
    fillColor = ~colorQuantile("BuPu", self_rep_health)(self_rep_health), 
    stroke = TRUE, 
    color = "black", 
    weight = 0.3,
    label = mytext5,
    labelOptions = labelOptions( 
      style = list("font-weight" = "normal", padding = "3px 8px"), 
      textsize = "13px", 
      direction = "auto"
    )
  ) %>%
  addLegend( pal = pal5, values = ~self_rep_health, opacity = 0.9, title = "Self Reported Health", position = "topleft" )

We can see that obesity and hypertension are highly correlated, with a higher percentage of adults being obese and having hypertension in the Bronx and parts of Brooklyn. On the other hand, Manhattan has a much lower percentage of adults being obese and having hypertension. This relationship is further confirmed with life expectancy and self-reported health. Life expectancy at birth is highest for those in Manhattan and parts of Queens. Further, the highest percentage of adults rating their health as “excellent”, “very good” or “good” lies in Manhattan and parts of Brooklyn. Therefore, it appears that adults living in Manhattan are the “healthiest”.

Economic Conditions

To explore economic conditions, we sought to select variables that allowed for an understanding of economic conditions of the people as well as the monetary support the neighborhoods received for project investments. From the demographic dataset, property value dataset and the budget tracker, we looked at budget allocation for each community board, average market value, poverty, and rent burden.

In order to visually analyze the budget each community board receives from the city, we created a leaflet map showing the location of gardens and the money the community board receives as a budget for projects. Many of the boards do not receive any budget or do not have available data.

# Creating dataset for property values with applicable info
property_data1 =
  read_csv("./data/final_df_main.csv") %>%
  select(community_board, borough, overall_pop, poverty, rent_burden, avg_org_value, avg_rev_value, avg_tot_appropriated, latitude, longitude, garden_name) %>% 
  mutate(value_diff = avg_rev_value - avg_org_value)

# Calculating number of gardens
num_gardens = 
  property_data1 %>% 
  drop_na(garden_name) %>% 
  group_by(community_board) %>% 
  summarize(garden_num = n())

# Final economic dataset
property_data =
  left_join(property_data1, num_gardens, by = "community_board")

css_fix <- "div.info.legend.leaflet-control br {clear: both;}" # CSS to correct spacing
html_fix <- htmltools::tags$style(type = "text/css", css_fix)  # Convert CSS to HTML

# Creating color palette for map
pal <- colorNumeric(
  palette = "viridis",
  domain = property_data$avg_tot_appropriated)

# Creating leaflet
property_data %>%
  mutate(
    click_label =
      str_c("<b>", garden_name, "</b><br>Borough: ", borough,"</b><br>Community Borad: ", community_board, "</b><br>Budget: $", avg_tot_appropriated)) %>% 
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addCircleMarkers(~longitude, ~latitude, radius = .1, color = ~pal(avg_tot_appropriated), popup = ~click_label) %>% 
  addLegend("topleft", pal = pal, values = ~avg_tot_appropriated,
            title = "Budget Allocation",
            labFormat = labelFormat(prefix = "$"),
            opacity = 1) %<>% htmlwidgets::prependContent(html_fix)

To see how funding compares to the average market value, percentage poverty and rent burden, we created another set of leaflet maps to allow for further visual comparisons.

# Creating color palette for market value map
pal2 <- colorQuantile(
  palette = "viridis",
  domain = property_data$avg_rev_value)

# Creating quantiles for market value map (data is skewed)
property_data$quantile <- with(property_data, cut(avg_rev_value, 
                                breaks = quantile(avg_rev_value, probs = seq(0, 1, by = 0.25), na.rm = TRUE), 
                                include.lowest = TRUE))

# Creating market value map
property_data %>%
  mutate(
    click_label =
      str_c("<b>", garden_name, "</b><br>Borough: ", borough,"</b><br>Community Board: ", community_board, "</b><br>Market Value: $", avg_rev_value)) %>% 
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addCircleMarkers(~longitude, ~latitude, radius = .1, color = ~pal2(avg_rev_value), popup = ~click_label) %<>%
    addLegend("topleft", pal = pal2, values = ~avg_rev_value, 
               title = "Revised Market Value (Biannually)",
              opacity = 1,
              labFormat = function(type, cuts, p) {
      n = length(cuts)
      p = paste0(round(p * 100), '%')
      cuts = paste0(formatC(cuts[-n]), " - ", formatC(cuts[-1]))
      # mouse over the legend labels to see the percentile ranges
      paste0(
        '<span title="', p[-n], " - ", p[-1], '">', cuts,
        '</span>'
      )
    })
# Creating color palette for poverty map
pal3 <- colorNumeric(
  palette = "viridis",
  domain = property_data$poverty)

# Creating poverty map
property_data %>%
  mutate(
    click_label =
      str_c("<b>", garden_name, "</b><br>Borough: ", borough,"</b><br>Community Borad: ", community_board, "</b><br>Percent Poverty: ", poverty, "%")) %>% 
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addCircleMarkers(~longitude, ~latitude, radius = .1, color = ~pal3(poverty), popup = ~click_label) %>% 
    addLegend("topleft", pal = pal3, values = ~poverty,
            title = "Percent Poverty",
            labFormat = labelFormat(suffix = "%"),
            opacity = 1)
# Creating color palette for rent burden map
pal4 <- colorNumeric(
  palette = "viridis",
  domain = property_data$rent_burden)

# Creating rent burden map
property_data %>%
  mutate(
    click_label =
      str_c("<b>", garden_name, "</b><br>Borough: ", borough,"</b><br>Community Borad: ", community_board, "</b><br>Percent of Income Spent on Rent: ", rent_burden, "%")) %>% 
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addCircleMarkers(~longitude, ~latitude, radius = .1, color = ~pal4(rent_burden), popup = ~click_label) %>% 
    addLegend("topleft", pal = pal4, values = ~rent_burden,
            title = "Percent of Income Spent on Rent",
            labFormat = labelFormat(suffix = "%"),
            opacity = 1)

Looking at the budget and funding allocation map, we can see that most community boards receive around 150,000 to 350,000 dollars. Additionally, many areas with gardens do not receive any funding/lack available information. Simply looking at this distribution does not provide any clear association.

The market value map shows that average market value is highest in Manhattan, with other scattered areas in the Bronx and Brooklyn. The market value is much lower in areas further out from Manhattan. Percentage of adults in poverty is also highest in areas further from Manhattan, with highest percentages in the Bronx. Lastly, rent burden mimics the same pattern. People are spending more of their income on rent in the Bronx, Brooklyn and Queens.


Additional Analysis

To further analyze association with garden number and a variety of our health, demographic and economic variables, we created baseline linear models and then did spatial diagnostics to choose an appropriate spatial model.

First, we loaded the data and selected variables. We had to remove one community district that didn’t have any neighbors because in order to successfully run the spatial regression all community districts have to have neighbors.

analysis_data_final <-
  read_csv("./data/final_df.csv") %>% 
  select(id_spatial, community_board, obesity, hypertension, life_expectancy, self_rep_health, poverty, avg_rev_value, avg_tot_appropriated, borough, garden_num) %>% 
filter(!id_spatial %in% c('414'))
         
working <- getwd()
com_board_spdf <- readOGR(dsn = working, layer = "community_board_new", verbose = FALSE)
com_board_spdf2 <- com_board_spdf[!com_board_spdf@data$boro_cd %in% c("414"), ]
invisible(names(com_board_spdf))
analysis_data_spatial <- merge(com_board_spdf2, analysis_data_final, by.x = "boro_cd", by.y = "id_spatial")

Choosing Dependent Variables

To choose our dependent variables, we checked linearity of the relationship between garden number and potential outcomes. All outcomes have a somewhat linear association, however seems like the variance is not constant, which may lead to issues later. We tried transforming x (garden num) to log to see linear relationship but this resulted in an eventual error doing the spatial regression.

Predictor: Garden number

Confounder: Poverty

Outcomes: Average total appropriated, hypertension, life expectancy, obesity, self-reported health

Testing Spatial Autocorrelation

Check spatial dependence (non-random distribution) of variables using Moran’s I. A significant Moran’s I indicates the values are clustered and not spatially independent - therefore we should adjust for spatial parameter in our model.

Fiting Preliminary Models

  1. Life expectancy = garden_num + poverty
  2. Obesity = garden_num + poverty

Fitting Spatial Regression Model


Discussion